Multivariate Plots by quality rank
Function to create histogram plots by category variables
plot_hist_by_color <- function(x_str, by_str, bin_width, xmin, xmax, dx, ymin, ymax, dy) {ggplot(aes_string(x = x_str, color = by_str, fill = by_str), data = ww) + geom_histogram(binwidth = bin_width) + scale_x_continuous(limits = c(xmin, xmax), breaks = seq(xmin, xmax, dx)) + scale_y_continuous(limits = c(ymin, ymax), breaks = seq(ymin, ymax, dy))}
Function to create scatter plots by category variables
plot_scat_by_color <- function(y_str, x_str, by_str, ymin, ymax, dy) {ggplot(aes_string(y = y_str, x = x_str, color = by_str), data = ww) + geom_jitter(alpha = 1/4) + scale_y_continuous(limits = c(ymin, ymax), breaks = seq(ymin, ymax, dy))}
Function to create density plots by category variables
plot_density_by_color <- function(x_str, by_str) {ggplot(aes_string(x = x_str, color = by_str), data = ww) + geom_density(size = 1)}
Function to create box plots by category variables
plot_box_by_color <- function(y_str, x_str, by_str, ymin, ymax) {ggplot(aes_string(y = y_str, x = x_str, fill = by_str), data = ww) + geom_boxplot() + coord_cartesian(ylim = c(ymin, ymax))}
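The helpers above pass column names as strings through aes_string(), which recent ggplot2 releases mark as deprecated. As a hedged sketch only, the same box-plot helper could be written with the `.data` pronoun. The name `plot_box_by_color2`, its extra `data` argument, and the toy data frame `ww_demo` are assumptions for illustration and are not part of the report.

```r
library(ggplot2)

# Hypothetical stand-in for the ww data frame used in this report.
ww_demo <- data.frame(
  residual.sugar = c(1.2, 8.5, 14.0, 2.1, 6.3, 10.7),
  quality.rank   = factor(c("low", "low", "low", "high", "high", "high"))
)

# Same idea as plot_box_by_color(), but column names are looked up
# through the .data pronoun rather than aes_string().
plot_box_by_color2 <- function(data, y_str, x_str, by_str, ymin, ymax) {
  ggplot(data, aes(x = .data[[x_str]], y = .data[[y_str]],
                   fill = .data[[by_str]])) +
    geom_boxplot() +
    coord_cartesian(ylim = c(ymin, ymax)) +
    labs(x = x_str, y = y_str, fill = by_str)  # keep plain column names as labels
}

p_demo <- plot_box_by_color2(ww_demo, "residual.sugar", "quality.rank",
                             "quality.rank", ymin = 0, ymax = 20)
```

Passing the data frame as an argument, instead of reading the global `ww`, also makes the helper reusable on subsets.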
Residual sugar versus quality
p1 <- plot_hist_by_color(x_str = "residual.sugar", by_str = "quality.rank", bin_width = 0.0984, xmin = 0, xmax = 30, dx = 2, ymin = 0, ymax = 60, dy = 5)
p2 <- plot_scat_by_color(y_str = "residual.sugar", x_str = "quality", by_str = "quality.rank", ymin = 0, ymax = 20, dy = 5)
p3 <- plot_density_by_color(x_str = "residual.sugar", by_str = "quality.rank")
p4 <- plot_box_by_color(y_str = "residual.sugar", x_str = "quality.rank", by_str = "quality.rank", ymin = 0, ymax = 20)
suppressWarnings(grid.arrange(p1, p2, p3, p4, ncol=2))

The distributions partly overlap. The correlation between residual sugar and quality is negative. The median residual sugar of the low quality rank is larger than that of the other ranks. The median decreases only slowly with quality, so wine quality does not change much with residual sugar.
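The statements about medians and the sign of the correlation can be checked numerically as well as read off the plots. The sketch below uses a small made-up data frame `ww_demo`; the values and the cut points used to build `quality.rank` are assumptions, since the report defines its own ranks elsewhere.

```r
# Hypothetical toy data; the real analysis would use the ww data frame.
ww_demo <- data.frame(
  quality        = c(4, 4, 5, 6, 6, 6, 7, 8),
  residual.sugar = c(9.0, 7.5, 8.1, 6.0, 5.2, 6.4, 4.8, 4.1)
)

# Assumed binning of quality into three ranks (the report defines its own).
ww_demo$quality.rank <- cut(ww_demo$quality, breaks = c(0, 5, 6, 10),
                            labels = c("low", "medium", "high"))

# Median residual sugar within each quality rank.
rank_medians <- tapply(ww_demo$residual.sugar, ww_demo$quality.rank, median)

# Pearson correlation between residual sugar and the numeric quality score.
r_sugar_quality <- cor(ww_demo$residual.sugar, ww_demo$quality)
```

The same two calls apply unchanged to any of the other variables plotted against quality in this section.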
Chlorides versus quality
p1 <- plot_hist_by_color(x_str = "chlorides", by_str = "quality.rank", bin_width = 0.003, xmin = 0, xmax = 0.3, dx = 0.05, ymin = 0, ymax = 550, dy = 50)
p2 <- plot_scat_by_color(y_str = "chlorides", x_str = "quality", by_str = "quality.rank", ymin = 0.015, ymax = 0.075, dy = 0.005)
p3 <- plot_density_by_color(x_str = "chlorides", by_str = "quality.rank")
p4 <- plot_box_by_color(y_str = "chlorides", x_str = "quality.rank", by_str = "quality.rank", ymin = 0.015, ymax = 0.075)
suppressWarnings(grid.arrange(p1,p2,p3,p4,ncol=2))

The distributions are only slightly separated. The correlation between chlorides and quality is negative. The median chloride level is largest for the low quality rank and decreases with quality. Thus higher quality ranks contain lower chloride levels.
Total sulfur dioxide versus quality
p1 <- plot_hist_by_color(x_str = "total.sulfur.dioxide", by_str = "quality.rank", bin_width = 5, xmin = 0, xmax = 300, dx = 50, ymin = 0, ymax = 300, dy = 50)
p2 <- plot_scat_by_color(y_str = "total.sulfur.dioxide", x_str = "quality", by_str = "quality.rank", ymin = 0, ymax = 300, dy = 50)
p3 <- plot_density_by_color(x_str = "total.sulfur.dioxide", by_str = "quality.rank")
p4 <- plot_box_by_color(y_str = "total.sulfur.dioxide", x_str = "quality.rank", by_str = "quality.rank", ymin = 0, ymax = 300)
suppressWarnings(grid.arrange(p1,p2,p3,p4,ncol=2))

The distribution for the low quality rank is separated from the others, while the distributions of the medium and high quality ranks partly overlap. The correlation is negative. The median changes slowly with quality. Lower quality ranks contain more total sulfur dioxide.
Mass density versus quality
p1 <- plot_hist_by_color(x_str = "mass.density", by_str = "quality.rank", bin_width = 0.0004, xmin = 0.98, xmax = 1.01, dx = 0.005, ymin = 0, ymax = 300, dy = 50)
p2 <- plot_scat_by_color(y_str = "mass.density", x_str = "quality", by_str = "quality.rank", ymin = 0.986, ymax = 1.004, dy = 0.002)
p3 <- plot_density_by_color(x_str = "mass.density", by_str = "quality.rank")
p4 <- plot_box_by_color(y_str = "mass.density", x_str = "quality.rank", by_str = "quality.rank", ymin = 0.986, ymax = 1.004)
suppressWarnings(grid.arrange(p1,p2,p3,p4,ncol=2))

Distributions for different quality ranks are separated from each other. Correlation between mass density and quality is negative. The median decreases with quality. Mass density is larger for lower quality rank.
Alcohol versus quality
p1 <- plot_hist_by_color(x_str = "alcohol", by_str = "quality.rank", bin_width = 0.1, xmin = 8, xmax = 15, dx = 1, ymin = 0, ymax = 250, dy = 50)
p2 <- plot_scat_by_color(y_str = "alcohol", x_str = "quality", by_str = "quality.rank", ymin = 8, ymax = 14, dy = 1)
p3 <- plot_density_by_color(x_str = "alcohol", by_str = "quality.rank")
p4 <- plot_box_by_color(y_str = "alcohol", x_str = "quality.rank", by_str = "quality.rank", ymin = 8, ymax = 14)
suppressWarnings(grid.arrange(p1,p2,p3,p4,ncol=2))

Distributions for different quality ranks are very well separated from each other. Correlation between alcohol and quality is positive and strong. The median increases with quality. Thus higher quality rank contains more alcohol and lower quality rank contains less alcohol.
Multivariate Plots of variable versus alcohol and by quality rank
Residual sugar
p1 <- plot_scat_by_color(y_str = "residual.sugar", x_str = "alcohol", by_str = "quality.rank", ymin = 0, ymax = 20, dy = 5) + facet_wrap(~ quality.rank, ncol = 3)
p2 <- plot_box_by_color(y_str = "residual.sugar", x_str = "alcohol.degree", by_str = "quality.rank", ymin = 0, ymax = 20) + facet_wrap(~ quality.rank, ncol = 3)
suppressWarnings(grid.arrange(p1, p2, ncol=1))

The correlation between residual sugar and alcohol is negative and strong for all quality ranks. The decrease is nonlinear: residual sugar falls quickly from low to medium alcohol degree, but only slowly from medium to high alcohol degree.
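The box plots rely on a binned version of alcohol (`alcohol.degree`). One plausible way to build such a factor, and to see the fast-then-slow decrease in numbers, is sketched below. The tercile cut points and all values are assumptions; the report's own definition of `alcohol.degree` may differ.

```r
# Hypothetical alcohol values (% vol) and residual sugar (g/L).
alcohol        <- c(8.8, 9.2, 9.5, 10.1, 10.6, 11.0, 11.9, 12.4, 13.0)
residual.sugar <- c(14.0, 12.5, 11.0, 5.5, 5.0, 4.6, 3.9, 3.5, 3.2)

# Assumed definition: tercile-based bins. The report's own alcohol.degree
# may use different cut points.
breaks <- quantile(alcohol, probs = c(0, 1/3, 2/3, 1))
alcohol.degree <- cut(alcohol, breaks = breaks,
                      labels = c("low", "medium", "high"),
                      include.lowest = TRUE)

# Median residual sugar per alcohol degree, and the step between
# neighbouring bins (low -> medium, medium -> high).
sugar_by_degree <- tapply(residual.sugar, alcohol.degree, median)
steps <- diff(as.vector(sugar_by_degree))
```

A first step that is much larger in magnitude than the second corresponds to the nonlinear decrease described above.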
Chlorides
p1 <- plot_scat_by_color(y_str = "chlorides", x_str = "alcohol", by_str = "quality.rank", ymin = 0.01, ymax = 0.07, dy = 0.01) + facet_wrap(~ quality.rank, ncol = 3)
p2 <- plot_box_by_color(y_str = "chlorides", x_str = "alcohol.degree", by_str = "quality.rank", ymin = 0.01, ymax = 0.07) + facet_wrap(~ quality.rank, ncol = 3)
suppressWarnings(grid.arrange(p1, p2, ncol=1))

Correlation between the chlorides and alcohol is negative and strong for all quality ranks. The chlorides decrease nearly linearly with alcohol for all quality ranks.
Total sulfur dioxide
p1 <- plot_scat_by_color(y_str = "total.sulfur.dioxide", x_str = "alcohol", by_str = "quality.rank", ymin = 50, ymax = 250, dy = 50) + facet_wrap(~ quality.rank, ncol = 3)
p2 <- plot_box_by_color(y_str = "total.sulfur.dioxide", x_str = "alcohol.degree", by_str = "quality.rank", ymin = 50, ymax = 250) + facet_wrap(~ quality.rank, ncol = 3)
suppressWarnings(grid.arrange(p1, p2, ncol=1))

Correlation between the total sulfur dioxide and alcohol is negative and strong for all quality ranks. The total sulfur dioxide decreases approximately linearly with alcohol for all quality ranks.
Mass density
p1 <- plot_scat_by_color(y_str = "mass.density", x_str = "alcohol", by_str = "quality.rank", ymin = 0.987, ymax = 1.003, dy = 0.002) + facet_wrap(~ quality.rank, ncol = 3)
p2 <- plot_box_by_color(y_str = "mass.density", x_str = "alcohol.degree", by_str = "quality.rank", ymin = 0.987, ymax = 1.003) + facet_wrap(~ quality.rank, ncol = 3)
suppressWarnings(grid.arrange(p1, p2, ncol=1))

Correlation between the mass density and alcohol is negative and very strong for all quality ranks. The mass density decreases almost linearly with alcohol for all quality ranks.
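Claims of the form "negative for all quality ranks" can be backed by computing the correlation separately inside each rank. A minimal sketch, assuming a toy data frame `ww_demo` with made-up values in place of `ww`:

```r
# Hypothetical toy data standing in for ww.
ww_demo <- data.frame(
  quality.rank = factor(rep(c("low", "medium", "high"), each = 4),
                        levels = c("low", "medium", "high")),
  alcohol      = c(9.0, 9.5, 10.0, 10.5,
                   9.5, 10.5, 11.5, 12.0,
                   10.5, 11.5, 12.5, 13.0),
  mass.density = c(0.9990, 0.9975, 0.9968, 0.9950,
                   0.9972, 0.9951, 0.9930, 0.9921,
                   0.9948, 0.9925, 0.9908, 0.9899)
)

# Pearson correlation of mass density with alcohol inside each quality rank.
r_by_rank <- sapply(split(ww_demo, ww_demo$quality.rank),
                    function(d) cor(d$mass.density, d$alcohol))
```

The result is a named vector with one coefficient per rank, which can be compared directly with the faceted scatter plots.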
Multivariate Plots of variables versus mass density and by quality rank
Residual sugar
p1 <- plot_scat_by_color(y_str = "residual.sugar", x_str = "mass.density", by_str = "quality.rank", ymin = 0, ymax = 20, dy = 5) + scale_x_continuous(limits = c(0.986,1.004), breaks=seq(0.986,1.004,0.004)) + facet_wrap(~ quality.rank, ncol = 3)
p2 <- plot_box_by_color(y_str = "residual.sugar", x_str = "mass.density.level", by_str = "quality.rank", ymin = 0, ymax = 20) + facet_wrap(~ quality.rank, ncol = 3)
suppressWarnings(grid.arrange(p1, p2, ncol=1))

Correlation between the residual sugar and mass density is positive and strong for all quality ranks. The residual sugar increases nearly linearly with mass density for all quality ranks.
Chlorides
p1 <- plot_scat_by_color(y_str = "chlorides", x_str = "mass.density", by_str = "quality.rank", ymin = 0.01, ymax = 0.07, dy = 0.01) + scale_x_continuous(limits = c(0.986,1.004), breaks=seq(0.986,1.004,0.004)) + facet_wrap(~ quality.rank, ncol = 3)
p2 <- plot_box_by_color(y_str = "chlorides", x_str = "mass.density.level", by_str = "quality.rank", ymin = 0.01, ymax = 0.07) + facet_wrap(~ quality.rank, ncol = 3)
suppressWarnings(grid.arrange(p1, p2, ncol=1))

Correlation between chlorides and mass density is positive. The chlorides increase nearly linearly with mass density for all quality ranks.
Total sulfur dioxide
p1 <- plot_scat_by_color(y_str = "total.sulfur.dioxide", x_str = "mass.density", by_str = "quality.rank", ymin = 50, ymax = 250, dy = 50) + scale_x_continuous(limits = c(0.986,1.004), breaks=seq(0.986,1.004,0.004)) + facet_wrap(~ quality.rank, ncol = 3)
p2 <- plot_box_by_color(y_str = "total.sulfur.dioxide", x_str = "mass.density.level", by_str = "quality.rank", ymin = 50, ymax = 250) + facet_wrap(~ quality.rank, ncol = 3)
suppressWarnings(grid.arrange(p1, p2, ncol=1))

Correlation between total sulfur dioxide and mass density is positive and nearly linear. The total sulfur dioxide increases nearly linearly with mass density level for all quality ranks.
Alcohol
p1 <- plot_scat_by_color(y_str = "alcohol", x_str = "mass.density", by_str = "quality.rank", ymin = 8, ymax = 15, dy = 1) + scale_x_continuous(limits = c(0.986,1.004), breaks=seq(0.986,1.004,0.004)) + facet_wrap(~ quality.rank, ncol = 3)
p2 <- plot_box_by_color(y_str = "alcohol", x_str = "mass.density.level", by_str = "quality.rank", ymin = 8, ymax = 15) + facet_wrap(~ quality.rank, ncol = 3)
suppressWarnings(grid.arrange(p1, p2, ncol=1))

Correlation between alcohol and mass density is negative and nearly linear for all quality ranks. The alcohol decreases with mass density nearly linearly for all quality ranks.
Multivariate scatter plots of strongly-correlated variables by quality ranks
Function to create scatter plots of two variables colored by a category variable
plot_scat_multi_var_by_color <- function(x_str, y_str, by_str, ymin, ymax, dy, xmin, xmax, dx) {ggplot(aes_string(x = x_str, y = y_str, color = by_str), data = ww) + geom_point(alpha = 1/2, size = 3, position = 'jitter') + scale_y_continuous(limits = c(ymin, ymax), breaks = seq(ymin, ymax, dy)) + scale_x_continuous(limits = c(xmin, xmax), breaks = seq(xmin, xmax, dx)) + scale_color_brewer(type = "qual", guide = guide_legend(title = 'Quality rank', reverse = F,override.aes = list(alpha = 1, size = 3)))}
Alcohol versus mass density and by quality ranks
p1 <- plot_scat_multi_var_by_color(x_str = "mass.density", y_str = "alcohol", by_str = "quality.rank", ymin = 8, ymax = 14, dy = 1, xmin = 0.985, xmax = 1.005, dx = 0.005)
suppressWarnings(grid.arrange(p1, ncol=1))

Correlation between alcohol and mass density is negative, strong and nearly linear. Higher quality rank contains higher alcohol and has less mass density.
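"Nearly linear" can be quantified with a simple linear fit. The sketch below fits alcohol against mass density on made-up values (the data frame `demo` is an assumption, not data from the report) and extracts the slope and the coefficient of determination.

```r
# Hypothetical values; replace with ww$mass.density and ww$alcohol.
demo <- data.frame(
  mass.density = c(0.9900, 0.9915, 0.9930, 0.9945, 0.9960, 0.9975, 0.9990),
  alcohol      = c(12.9, 12.1, 11.6, 10.8, 10.1, 9.6, 8.9)
)

fit <- lm(alcohol ~ mass.density, data = demo)

slope     <- unname(coef(fit)["mass.density"])  # change in alcohol per unit density
r_squared <- summary(fit)$r.squared             # share of variance explained
pearson_r <- cor(demo$mass.density, demo$alcohol)
```

For a fit with a single predictor, `r_squared` equals the square of `pearson_r`, so a value close to 1 supports describing the relationship as nearly linear.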
Residual sugar versus mass density and by quality ranks
p1 <- plot_scat_multi_var_by_color(x_str = "mass.density", y_str = "residual.sugar", by_str = "quality.rank", ymin = 0, ymax = 25, dy = 5, xmin = 0.985, xmax = 1.005, dx = 0.005)
suppressWarnings(grid.arrange(p1, ncol=1))

Correlation between residual sugar and mass density is positive, strong and approximately linear. At a given mass density, higher quality ranks tend to have higher residual sugar; equivalently, at a given residual sugar they tend to have lower mass density.
Alcohol versus residual sugar and by quality ranks
p1 <- plot_scat_multi_var_by_color(x_str = "residual.sugar", y_str = "alcohol", by_str = "quality.rank", ymin = 8, ymax = 14, dy = 1, xmin = 0, xmax = 20, dx = 5)
suppressWarnings(grid.arrange(p1, ncol=1))

Correlation between alcohol and residual sugar is negative. Higher quality rank has higher alcohol and lower residual sugar.
Total sulfur dioxide versus free sulfur dioxide and by quality ranks
p1 <- plot_scat_multi_var_by_color(x_str = "free.sulfur.dioxide", y_str = "total.sulfur.dioxide", by_str = "quality.rank", ymin = 0, ymax = 250, dy = 50, xmin = 0, xmax = 80, dx = 20)
suppressWarnings(grid.arrange(p1, ncol=1))

Correlation between free sulfur dioxide and total sulfur dioxide is positive and nearly linear. The distributions for different quality ranks cannot be well separated.
Total sulfur dioxide versus mass density and by quality ranks
p1 <- plot_scat_multi_var_by_color(x_str = "mass.density", y_str = "total.sulfur.dioxide", by_str = "quality.rank", ymin = 0, ymax = 250, dy = 50, xmin = 0.985, xmax = 1.005, dx = 0.005)
suppressWarnings(grid.arrange(p1, ncol=1))

Correlation between total sulfur dioxide and mass density is positive and approximately linear. Higher quality rank has less mass density and less total sulfur dioxide.
Alcohol versus quality and by quality ranks
p1 <- plot_scat_multi_var_by_color(x_str = "quality", y_str = "alcohol", by_str = "quality.rank", ymin = 8, ymax = 15, dy = 1, xmin = 3, xmax = 9, dx = 1)
suppressWarnings(grid.arrange(p1, ncol=1))

Correlation between alcohol and quality is positive and nearly linear. Higher quality rank contains higher alcohol.
Mass density versus quality and by quality ranks
p1 <- plot_scat_multi_var_by_color(x_str = "quality", y_str = "mass.density", by_str = "quality.rank", ymin = 0.985, ymax = 1.005, dy = 0.005, xmin = 3, xmax = 9, dx = 1)
suppressWarnings(grid.arrange(p1, ncol=1))

Correlation between mass density and quality is negative and nearly linear. Higher quality rank has lower mass density.
Residual sugar versus quality and by quality ranks
p1 <- plot_scat_multi_var_by_color(x_str = "quality", y_str = "residual.sugar", by_str = "quality.rank", ymin = 0, ymax = 30, dy = 5, xmin = 3, xmax = 9, dx = 1)
suppressWarnings(grid.arrange(p1, ncol=1))

Correlation between residual sugar and quality is negative. Higher quality contains less residual sugar.
Alcohol versus chlorides and by quality ranks
p1 <- plot_scat_multi_var_by_color(x_str = "alcohol", y_str = "chlorides", by_str = "quality.rank", ymin = 0, ymax = 0.08, dy = 0.02, xmin = 8, xmax = 14, dx = 1)
suppressWarnings(grid.arrange(p1, ncol=1))

Correlation between chlorides and alcohol is negative and approximately linear. Higher quality rank contains higher alcohol and lower chlorides.
Total sulfur dioxide versus alcohol and by quality ranks
p1 <- plot_scat_multi_var_by_color(x_str = "alcohol", y_str = "total.sulfur.dioxide", by_str = "quality.rank", ymin = 0, ymax = 250, dy = 50, xmin = 8, xmax = 15, dx = 1)
suppressWarnings(grid.arrange(p1, ncol=1))

Correlation between total sulfur dioxide and alcohol is negative. Higher quality rank contains higher alcohol and lower total sulfur dioxide.
Total sulfur dioxide versus residual sugar and by quality ranks
p1 <- plot_scat_multi_var_by_color(x_str = "residual.sugar", y_str = "total.sulfur.dioxide", by_str = "quality.rank", ymin = 0, ymax = 250, dy = 50, xmin = 0, xmax = 20, dx = 5)
suppressWarnings(grid.arrange(p1, ncol=1))

Correlation between total sulfur dioxide and residual sugar is positive. The distributions for different quality ranks cannot be well separated.
Comparison of scatter plots of the strongest correlations
p1 <- plot_scat_multi_var_by_color(x_str = "mass.density", y_str = "alcohol", by_str = "quality.rank", ymin = 8, ymax = 14, dy = 1, xmin = 0.985, xmax = 1.005, dx = 0.005)
p2 <- plot_scat_multi_var_by_color(x_str = "mass.density", y_str = "residual.sugar", by_str = "quality.rank", ymin = 0, ymax = 20, dy = 5, xmin = 0.985, xmax = 1.005, dx = 0.005)
suppressWarnings(grid.arrange(p1, p2, ncol=2))

Correlation between alcohol and mass density is negative, strong and nearly linear. Higher quality rank contains higher alcohol and has less mass density.
Correlation between residual sugar and mass density is positive, strong and nearly linear. At a given mass density, lower quality ranks tend to have lower residual sugar; equivalently, at a given residual sugar they tend to have larger mass density. The residual sugar of the high quality rank cannot be separated from that of the medium quality rank.
Mass density versus alcohol by quality ranks
p1 <- plot_scat_multi_var_by_color(x_str = "mass.density", y_str = "alcohol", by_str = "quality.rank", ymin = 8, ymax = 14, dy = 1, xmin = 0.985, xmax = 1.005, dx = 0.005)
p2 <- plot_box_by_color(y_str = "mass.density", x_str = "alcohol.degree", by_str = "quality.rank", ymin = 0.985, ymax = 1.005)
suppressWarnings(grid.arrange(p1, p2, ncol=2))

Lower quality ranks have higher mass density and lower alcohol, and higher quality ranks have lower mass density and higher alcohol.
In medium and high alcohol wines, the mass density decreases with quality and thus the mass density of higher quality is smaller. However, in low alcohol wines, the mass density increases with quality and thus the mass density of higher quality is larger.
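The reversal described above (density rising with quality in low alcohol wines, falling otherwise) is easiest to verify with a two-way table of medians. The following is a sketch on made-up data with only two levels per factor; `ww_demo` and its values are assumptions for illustration.

```r
# Hypothetical toy data with the grouping and measurement columns.
ww_demo <- data.frame(
  alcohol.degree = factor(rep(c("low", "high"), each = 4),
                          levels = c("low", "high")),
  quality.rank   = factor(rep(c("low", "high"), times = 4),
                          levels = c("low", "high")),
  mass.density   = c(0.9970, 0.9982, 0.9974, 0.9986,
                     0.9935, 0.9912, 0.9931, 0.9916)
)

# Rows: alcohol degree; columns: quality rank; cells: median mass density.
med_table <- with(ww_demo,
                  tapply(mass.density,
                         list(alcohol.degree, quality.rank),
                         median))

# Sign of the change in median density from low to high quality,
# computed separately for each alcohol degree.
trend <- sign(med_table[, "high"] - med_table[, "low"])
```

A positive sign in the low alcohol row together with a negative sign in the other rows would reproduce the pattern seen in the box plot.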
Multivariate plots of crossing correlations
Function to create scatter plots of crossing correlations
plot_scat_multi_var_cross <- function(y_str, x_str, by_str, ymin, ymax, dy){ggplot(aes_string(y = y_str, x = x_str, color = by_str), data = ww) + geom_jitter(alpha = 1/2) + scale_y_continuous(limits = c(ymin, ymax), breaks=seq(ymin, ymax, dy)) + facet_wrap(~ quality.rank, ncol = 3)}
Function to create box plots of crossing correlations
plot_box_multi_var_cross <- function(y_str, x_str, by_str, ymin, ymax){ggplot(aes_string(y = y_str, x = x_str, color = by_str), data = ww) + geom_boxplot() + coord_cartesian(ylim = c(ymin, ymax)) + facet_wrap(~ quality.rank, ncol=3)}
Residual sugar versus alcohol by mass density and quality
p1 <- plot_scat_multi_var_cross(y_str="residual.sugar", x_str = "alcohol", by_str = "mass.density.level", ymin = 0, ymax = 20, dy =5)
p2 <- plot_box_multi_var_cross(y_str = "residual.sugar", x_str = "alcohol.degree", by_str = "mass.density.level", ymin = 0, ymax = 20)
suppressWarnings(grid.arrange(p1, p2, ncol=1))

In all quality ranks, low mass density almost always corresponds to high alcohol and low residual sugar, and high mass density almost always corresponds to low alcohol and high residual sugar. Most low quality wines have high residual sugar, low alcohol, and high mass density. In the medium quality rank, the number of low mass density wines is close to that of medium and high mass density wines. High quality wines appear more likely to have low mass density, high alcohol, and low residual sugar.
In all quality ranks and for all alcohol degrees, the residual sugar increases with mass density monotonically.
These plots reveal some physicochemical characteristics of white wines. For example, mass density and residual sugar are low for most high quality, high alcohol wines.
Total sulfur dioxide versus alcohol by mass density and quality
p1 <- plot_scat_multi_var_cross(y_str="total.sulfur.dioxide", x_str = "alcohol", by_str = "mass.density.level", ymin = 0, ymax = 300, dy =50)
p2 <- plot_box_multi_var_cross(y_str = "total.sulfur.dioxide", x_str = "alcohol.degree", by_str = "mass.density.level", ymin = 0, ymax = 300)
suppressWarnings(grid.arrange(p1, p2, ncol=1))

For all quality ranks, low mass density almost always corresponds to high alcohol and low total sulfur dioxide, and high mass density almost always corresponds to low alcohol and high total sulfur dioxide.
In the medium quality rank, total sulfur dioxide increases monotonically with mass density for all alcohol degrees. In the low quality rank, it increases monotonically with mass density only for medium alcohol; for low and high alcohol the change is not monotonic. In the high quality rank, total sulfur dioxide does not change monotonically with mass density.
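Whether a variable "increases monotonically" across mass density levels can be tested on the group medians rather than judged by eye. The helper `is_monotone_increasing` below is hypothetical (it is not part of the report), and the values are made up for illustration.

```r
# TRUE when the group medians of y strictly increase across the ordered
# levels of the factor g (hypothetical helper, not part of the report).
is_monotone_increasing <- function(y, g) {
  medians <- tapply(y, g, median)
  all(diff(as.vector(medians)) > 0)
}

# Hypothetical total sulfur dioxide values at three mass density levels.
level <- factor(rep(c("low", "medium", "high"), each = 3),
                levels = c("low", "medium", "high"))
tso2_rising <- c(95, 100, 110, 125, 130, 140, 150, 160, 170)
tso2_mixed  <- c(120, 130, 135, 110, 115, 125, 150, 160, 170)

rising_ok <- is_monotone_increasing(tso2_rising, level)
mixed_ok  <- is_monotone_increasing(tso2_mixed, level)
```

On the real data, the helper would be applied to each combination of quality rank and alcohol degree, for example after splitting `ww` on those two factors.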
Chlorides versus alcohol by mass density and quality
p1 <- plot_scat_multi_var_cross(y_str="chlorides", x_str = "alcohol", by_str = "mass.density.level", ymin = 0, ymax = 0.1, dy =0.02)
p2 <- plot_box_multi_var_cross(y_str = "chlorides", x_str = "alcohol.degree", by_str = "mass.density.level", ymin = 0, ymax = 0.1)
suppressWarnings(grid.arrange(p1, p2, ncol=1))

For all quality ranks, low mass density almost always corresponds to high alcohol and low chlorides, and high mass density almost always corresponds to low alcohol and high chlorides.
In medium alcohol degree, the chlorides increase with mass density monotonically for all quality ranks. In low alcohol degree and high alcohol degree, the chlorides do not change monotonically with mass density.